Lecture 8: So Let’s Build a Reporting Engine

A Big, Messy Survey: Continued

Setting: A real world marketing analytics project with a big company.

The Challenges:

Reporting Engines

A Reminder: States of Engagement

A Reminder: the Melted Data

Static Reports

Dynamic Reporting

Advantages of Dynamic Reporting

Building a Reporting Engine

The shiny Package

Software Development in RMarkdown

RMarkdown simplifies all of this as much as possible. You can focus more on everything else that’s important:

So Let’s Build a Reporting Engine!

Over the course of today’s lecture, we will build a reporting engine for the Marketing Analytics project. The resulting software can then be used as a prototype for the reporting engines you might build in the future.

Piece 1: Setting Up a Flexdashboard

Or you can start from a completely blank .Rmd file.

The High Level Parameters

Piece 1’s Output

Piece 2: RMarkdown’s Setup

library(flexdashboard)
library(shiny)
library(rmarkdown)
library(knitr)
library(Hmisc)
library(DT)

library(data.table)
assignInNamespace(x = "cedta.override", value = c(data.table:::cedta.override,"rmarkdown"), ns = "data.table")

opts_chunk$set(echo = FALSE, comment="", warning = FALSE, message = FALSE, tidy.opts=list(width.cutoff=55), tidy = TRUE)

Components of the Setup

The Technical Workaround

library(data.table)
assignInNamespace(x = "cedta.override", value = c(data.table:::cedta.override,"rmarkdown"), ns = "data.table")

Output from Pieces 1-2

Nothing in this step changed the output in any way.

Additional Functions

Additional Variables

id.name <- "id"
age.name <- "age"
gender.name <- "gender"
income.name <- "income"
region.name <- "region"
persona.name <- "persona"

product.name <- "Product"
awareness.name <- "Awareness"
consideration.name <- "Consideration"
consumption.name <- "Consumption"
satisfaction.name <- "Satisfaction"
advocacy.name <- "Advocacy"

bp.pattern <- "BP_"

age.group.name <- "age_group"
income.group.name <- "income_group"

cuts.age <- c(18, 35, 50, 65, 120)
cuts.income <- 1000* c(0, 25, 50, 75, 100, 200)

Piece 3: Reading the Data

We have our usual way of reading in a .csv file:

dat <- fread(input = "Simulated Marketing Data -- Melted.csv", verbose = FALSE)

More Variables

dat[, eval(age.group.name) := cut2(x = get(age.name), cuts = cuts.age)]
dat[, eval(income.group.name) := cut2(x = get(income.name), cuts = cuts.income)]
dat[, eval(satisfaction.name) := get(satisfaction.name) / 10]

unique.age.groups <- dat[, sort(unique(get(age.group.name)))]
unique.genders <- dat[, sort(unique(get(gender.name)))]
unique.income.groups <- dat[, sort(unique(get(income.group.name)))]
unique.regions <- dat[, sort(unique(get(region.name)))]
unique.personas <- dat[, sort(unique(get(persona.name)))]

unique.products <- dat[, unique(get(product.name))]

respondent.variables <- c(age.group.name, gender.name, income.group.name, region.name, persona.name)
states.of.engagement <- c(awareness.name, consideration.name, consumption.name, satisfaction.name, advocacy.name)
bp.traits <- names(dat)[grep(pattern = bp.pattern, x = names(dat))]

Output From Pieces 1-3

Nothing in this step changed the output in any way.

Piece 4: Tabs

Deciding on the Tabs

We will summarize the report with the following tabs:

RMarkdown Code for the Tabs

A Tab’s Layout

Output From Pieces 1-4

Piece 5: The Introduction Tab

This tab is meant to provide a brief overview of the data:

Note that this tab is written just like a regular RMarkdown report. There is not any dynamic content here.

Output from Pieces 1-5

Piece 6: The Respondents Tab

This tab is meant to summarize the person-specific variables in the data:

Too Much Information

An Input Panel

inputPanel(
  selectInput(inputId="respondent_variable", label = "Select Variable:", choices = respondent.variables, selected = respondent.variables[1]),
  checkboxInput(inputId = "respondent_show_percentages", label = "Show Percentages", value = TRUE)
)

Two Inputs Requested

The panel is asking for two pieces of information:

Input Mechanics

input <- list(respondent_variable = respondent.variables[1], respondent_show_percentages = TRUE)
print(input)
$respondent_variable
[1] "age_group"

$respondent_show_percentages
[1] TRUE

Yes/No Menus with checkboxInput

The first line of the input panel selects the variable to summarize:

checkboxInput(inputId = "respondent_show_percentages", label = "Show Percentages", value = TRUE)

Many Other Kinds of Input Functions:

[http://shiny.rstudio.com/images/shiny-cheatsheet.pdf]

Output from Pieces 1-6

Piece 7: The Respondents Output

In this case, we want to:

Reactive Content with renderPlot

renderPlot({
  tab <- percentage.table(x = dat[get(product.name) == get(product.name)[1], get(input$respondent_variable)])
  barplot(height = tab, space=0.01, las = 1, main = input$respondent_variable, ylab = "Percentage", xlab = input$respondent_variable, ylim = c(0, 1.2*max(tab, na.rm = TRUE)), col = "dodgerblue")
  
  if(input$respondent_show_percentages == TRUE){
    space_val = 0
    text(x = -0.4 + 1:length(tab) * (1+space_val), y = tab, labels = sprintf("%.1f%%", tab), pos = 3)
  }
})

Many Other Kinds of Rendering Functions:

[http://shiny.rstudio.com/images/shiny-cheatsheet.pdf]

Output from Pieces 1-7

Piece 8: Product Information

A Single Bar Graph

Visual Problems with Numerous Graphs

It can be difficult to find one set of parameters that optimizes the display for all of the potential graphs that a reporting engine will generate. It can help to create the capability for the user to customize the visual display to solve these problems:

You may want to focus attention:

A Plan for an Input Panel

The Product Information Input Panel

inputPanel(
  selectInput(inputId = "product_info_engagement_state", label = "Select State of Engagement:", choices = states.of.engagement, selected = states.of.engagement[1]),
  checkboxInput(inputId = "product_info_decreasing", label = "Sorted", value=TRUE),
  checkboxInput(inputId = "product_info_show_percentages", label = "Show Percentages", value = TRUE)
,
  sliderInput(inputId = "product_info_min_threshold", label = "Show Products Above", min = 0, max = 100, value = 20, step = 5),
  sliderInput(inputId = "product_info_names_magnification", label = "Magnify Product Names", min = 0.4, max = 1.4, value = 1, step = 0.1)
)

Product Information’s Menus

Reactive Content for Product Info

Product Information’s Reactive Content

renderPlot({
  rates <- dat[, .(Mean = 100*mean(get(input$product_info_engagement_state), na.rm=TRUE)/max(get(input$product_info_engagement_state), na.rm = TRUE)), by = product.name]

  if(input$product_info_decreasing == TRUE){
    setorderv(x = rates, cols = "Mean", order = -1)
  }  
  barplot(height = rates[Mean > input$product_info_min_threshold,  Mean], names.arg = rates[Mean > input$product_info_min_threshold, get(product.name)], space=0.01, las = 1, main = input$product_info_engagement_state, ylab = sprintf("Rate of %s", input$product_info_engagement_state), cex.names = input$product_info_names_magnification, ylim = c(-100, 120), xaxt = "n", axes = F, col = "dodgerblue")
  axis(side = 2, at = 20*(0:5), las = 2)
  
  text(x = -0.5 + 1.02*1:rates[Mean > input$product_info_min_threshold, .N], y = -15, labels = rates[Mean > input$product_info_min_threshold, get(product.name)], srt = 45, cex = input$product_info_names_magnification, pos = 2)
  
  if(input$product_info_show_percentages == TRUE){
    space_val = 0
    text(x = -0.4 + 1:rates[Mean > input$product_info_min_threshold, .N] * (1+space_val), y = rates[Mean > input$product_info_min_threshold, Mean], labels = sprintf("%.1f%%", rates[Mean > input$product_info_min_threshold, Mean]), pos = 3)

  }
})

Output for Pieces 1-8

Piece 9: Brand Perceptions

Brand Perception’s Input Panel

inputPanel(
  selectInput(inputId="bp_product", label = "Select Brand:", choices = unique.products, selected = unique.products[1]),
  selectInput(inputId="bp_trait", label = "Select Perception:", choices = bp.traits, selected = bp.traits[1]),
  checkboxInput(inputId = "bp_show_percentages", label = "Show Percentages", value = TRUE)
)

Brand Perceptions’ Menus

Reactive Content for Brand Perceptions

Brand Perceptions’ Reactive Content

renderPlot({
  tab <- percentage.table(x = dat[get(product.name) == input$bp_product, get(input$bp_trait)])
  barplot(height = tab, space=0.01, las = 1, main = sprintf("%s for %s", input$bp_trait, input$bp_product), ylab = "Percentage", xlab = input$bp_trait, ylim = c(0, 1.2*max(tab, na.rm=TRUE)), col = "dodgerblue")
  if(input$bp_show_percentages == TRUE){
    space_val = 0
    text(x = -0.4 + 1:length(tab) * (1+space_val), y = tab, labels = sprintf("%.1f%%", tab), pos = 3)
  }
})

Output for Pieces 1-9

Piece 10: Engagement Plots

Engagement Plots’ Input Panel

inputPanel(
  selectInput(inputId="ep_product", label = "Select Brand:", choices = unique.products, selected = unique.products[1]),
  selectInput(inputId="ep_state", label = "Select State of Engagement:", choices = states.of.engagement, selected = states.of.engagement[1]),
  selectInput(inputId="ep_subgroup", label = "Select Subgroup:", choices = c("All", respondent.variables), selected = "All"),
  checkboxInput(inputId = "ep_show_percentages", label = "Show Percentages", value = TRUE)
)

Engagement Plots’ Menus

Reactive Content for Engagement Plots

Variables:

Subgrouping or Not

Engagement Plots’ Reactive Content

renderPlot({
  if(input$ep_subgroup == "All"){
    tab <- dat[get(product.name) == input$ep_product, .(Mean = 100*mean(get(input$ep_state), na.rm=TRUE))]
    tab[, All := "All respondents"]
  }
  else{
    tab <- dat[get(product.name) == input$ep_product, .(Mean = 100*mean(get(input$ep_state), na.rm=TRUE)), keyby = eval(input$ep_subgroup)]
  }
  
  barplot(height = tab[, Mean], names.arg = tab[, get(input$ep_subgroup)], space=0.01, las = 1, main = sprintf("%s of %s", input$ep_state, input$ep_product), ylab = "Percentage", xlab = input$ep_subgroup, ylim = c(0, 1.2 * max(tab[, Mean], na.rm = TRUE)), col = "dodgerblue")
  
  if(input$ep_show_percentages == TRUE){
    space_val = 0
    text(x = -0.4 + 1:tab[, .N] * (1+space_val), y = tab[, Mean], labels = sprintf("%.1f%%", tab[, Mean]), pos = 3)
  }
})

Output for Engagement Plots

Piece 11: Engagement Models Menus

Now we are ready for the most important piece of content. Not coincidentally, it also has the most features.

In the past lecture, we used regression techniques to constructure models for each product’s states of engagement. These multivariable models could take all of the respondent-specific and brand-specific variables into account.

However, now we have a twist: the client is asking for the flexibility to fit these models in any combination of products and any combination of subgroups of the respondent-specific variables.

Combinations of Subgroups

Muliple selections in a Dropdown Menu

Engagement Model’s Input Panel

inputPanel(
  selectInput(inputId="em_state", label = "State of Engagement:", choices = states.of.engagement, selected = states.of.engagement[1]),
  selectInput(inputId="em_product", label = "Brand", choices = unique.products, selected = unique.products[1], multiple = TRUE),
  selectInput(inputId="em_inputs", label = "Choose Inputs:", choices = c(age.group.name, gender.name, region.name, income.group.name, persona.name, bp.traits), selected = c(age.group.name, gender.name, region.name, income.group.name), multiple = TRUE),
  selectInput(inputId="em_age_group", label = "Age", choices = unique.age.groups, selected = unique.age.groups, multiple = TRUE),
  selectInput(inputId = "em_gender", label = "Gender", choices = unique.genders, selected = unique.genders, multiple = TRUE),
  selectInput(inputId = "em_income_group", label = "Income", choices = unique.income.groups, selected = unique.income.groups, multiple = TRUE),
  selectInput(inputId = "em_region", label = "Region", choices = unique.regions, selected = unique.regions, multiple = TRUE),
  selectInput(inputId = "em_persona", label = "Persona", choices = unique.personas, selected = unique.personas, multiple = TRUE)
)

Output for Engagement Models Input Panel

Piece 12: Reactive Engagement Models

A Lot of Models

Number of Models = Number of States of Engagement \(* \displaystyle\prod_{k=1}^{K}(2^{{n_k}} - 1)\).

Counting the Number of Models

num_subgroups <- function(x){
  return(2^(length(x)) -1)
}
num_models <- length(states.of.engagement) * num_subgroups(unique.products) * num_subgroups(unique.age.groups) * num_subgroups(unique.genders) * num_subgroups(unique.income.groups) * num_subgroups(unique.regions) * num_subgroups(unique.personas) * 
num_subgroups(bp.traits)

print(sprintf("There are %e possible models.", num_models))
[1] "There are 1.131837e+17 possible models."

That is something like 113.2 quadrillion potential models. Yikes!

113.2 Quadrillion? Really?

A Broader Problem

Dynamic Formulae

The Alternatives

A Plan for Reactive Content

Engagement Models’ Reactive Content

renderDataTable({
  subdat <- dat[get(product.name) %in% input$em_product & get(age.group.name) %in% input$em_age_group & get(gender.name) %in% input$em_gender & get(income.group.name) %in% input$em_income_group & get(region.name) %in% input$em_region & get(persona.name) %in% input$em_persona]
  
  if(input$em_state == satisfaction.name){
    model.type <- "linear"
  }
  if(input$em_state != satisfaction.name){
    model.type <- "logistic"
  }
  
  res <- fit.model(dt = subdat, outcome.name = input$em_state, input.names = input$em_inputs, model.type = model.type)
  
  datatable(data = res)
})

Pieces 1-12’s Output

What About Those 113.2 Quadrillion Models?

Coaching Your Users

Lo and Behold, a Reporting Engine

Simple Extensions of Reporting Engines

Training Your Team

Getting Feedback

The Importance of data.table

Tremendous Skills